Data information

Sample names and data file paths visualized in this report:

sample1: /Users/kathrynwalters/Documents/RStudio Git Locations/2024_sBC_proteogenomics/input/Betacell_mergedBamsSorted.sortedByCoord.out.bam_results_RiboseQC

1 Read location distribution

Per sample, the distribution of reads across different originating compartments (e.g. cytoplasmic and organellar footprints) and biotypes (e.g. CDS regions of protein-coding genes) is shown.

1.1 By biotype (and originating compartment)

sample1

1.2 By originating compartment (and biotype)

2 Read length distribution

Per sample, the distribution of read lengths is shown per originating compartment.

sample1

3 Read length and location distribution

Per sample and originating compartment, read length and location distributions are shown.

For each sample, absolute read counts and normalized read length distributions are shown.

3.1 Read length distribution per biotype

Read count shows absolute read numbers; in Read count fraction, the read counts for each biotype are normalized to sum to 1.

sample1

3.2 Read biotype distribution per read length

Per read length, the distribution of reads across biotypes is shown as a stacked barplot. Read count shows absolute read numbers; in Read count fraction, the read counts for each read length are normalized to sum to 1.
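The two Read count fraction views in Sections 3.1 and 3.2 differ only in the grouping used for normalization. The following is a minimal sketch of that idea, not the Ribo-seQC implementation; the toy counts and the function name `fractions` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy counts: (biotype, read_length) -> number of reads.
counts = {
    ("CDS", 28): 60, ("CDS", 29): 40,
    ("5'UTR", 28): 20, ("5'UTR", 29): 5,
}

def fractions(counts, group_index):
    """Normalize counts so that each group sums to 1.

    group_index=0 groups by biotype (as in Section 3.1),
    group_index=1 groups by read length (as in Section 3.2).
    """
    totals = defaultdict(int)
    for key, n in counts.items():
        totals[key[group_index]] += n
    return {key: n / totals[key[group_index]] for key, n in counts.items()}

per_biotype = fractions(counts, 0)  # each biotype sums to 1
per_length = fractions(counts, 1)   # each read length sums to 1
```

The same absolute counts therefore give different fractions depending on whether the question is "which read lengths make up a biotype" or "which biotypes make up a read length".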

sample1

4 Metagene analysis

Profiles of 5’ ends are displayed as a metagene plot that aggregates signal over all covered transcripts. The 5’ end profiles are calculated both at sub-codon resolution and over binned transcript regions.

Different scaling methods can be applied to the calculated profiles. Profiles for individual read lengths (without scaling) can also be visualized.

Disclaimer: When comparing samples, the read lengths displayed may differ, since read lengths are chosen for each sample individually.

4.1 5’ profiles

sample1

nucl

Select a resolution (subcodon or bins). With subcodon resolution, read coverage is shown for the first 25 nt after the transcription start site (TSS), 25 nt before and 33 nt after the start codon, 33 nt from the middle of the CDS, 33 nt before and 25 nt after the stop codon, and the last 25 nt before the transcription end site (TES). With bins, read coverage is shown for 50 bins between the TSS and the start codon (5’UTR), 100 bins for the CDS, and 50 bins after the stop codon (3’UTR).
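The bins resolution can be pictured as collapsing each transcript region into a fixed number of bins, so that transcripts of different lengths can be aggregated. The sketch below is only an illustration: summing within bins, the handling of short regions, and the function names are assumptions, not taken from Ribo-seQC.

```python
def bin_coverage(coverage, n_bins):
    """Sum per-nucleotide coverage into n_bins nearly equal-width bins."""
    length = len(coverage)
    if length < n_bins:
        raise ValueError("region is shorter than the number of bins")
    # Integer bin edges; bin i covers coverage[edges[i]:edges[i + 1]].
    edges = [i * length // n_bins for i in range(n_bins + 1)]
    return [sum(coverage[edges[i]:edges[i + 1]]) for i in range(n_bins)]

def bin_transcript(utr5, cds, utr3):
    """Concatenate 50 + 100 + 50 bins for 5'UTR, CDS, and 3'UTR."""
    return bin_coverage(utr5, 50) + bin_coverage(cds, 100) + bin_coverage(utr3, 50)

# Hypothetical transcript with uniform coverage in each region.
profile = bin_transcript([1] * 100, [2] * 300, [1] * 150)
```

Each transcript then contributes a vector of 200 values, regardless of its actual length.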

Select a scaling method: none (no scaling), log2 scaling, and z-score scaling.

Note: Select subcodon / none or bins / none to display the read coverage for each read length separately.
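As a rough illustration of the log2 and z-score scaling options, the sketch below applies both to a single profile. The pseudocount, the use of the population standard deviation, and the function names are assumptions made for this example; the report does not state how Ribo-seQC computes these values.

```python
import math
import statistics

def log2_scale(profile, pseudocount=1.0):
    """log2 of each value; the pseudocount (an assumption) avoids log2(0)."""
    return [math.log2(x + pseudocount) for x in profile]

def zscore_scale(profile):
    """Center on the mean and divide by the (population) standard deviation."""
    mean = statistics.fmean(profile)
    sd = statistics.pstdev(profile)
    if sd == 0:
        # A flat profile has no variation to scale.
        return [0.0 for _ in profile]
    return [(x - mean) / sd for x in profile]

profile = [0, 3, 7, 15]  # hypothetical coverage values
log2_profile = log2_scale(profile)
z_profile = zscore_scale(profile)
```

Log2 scaling compresses large peaks, while z-score scaling puts profiles with different depths on a comparable scale.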

Plot panels: subcodon / none (read lengths: all, 21, 25, 26, 27, 28, 29, 30); subcodon / log2; subcodon / zscore; bins / none; bins / log2; bins / zscore.

chrM

Plot panels: subcodon / none; subcodon / log2; subcodon / zscore; bins / none; bins / log2; bins / zscore.

4.2 Calculation of P-site positions

Read lengths, as well as their individual offsets, are selected according to the parameters specified in the Ribo-seQC run.

Note: Not all samples and originating organelles might be displayed here. Please check the parameters used in the Ribo-seQC run.

4.2.1 Per frame coverage

The fraction of 5’ ends (from Section 4.1) falling on each of the three possible frames is displayed for each read length and organelle. Each data point represents one transcript.
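The per-frame fraction for a single transcript can be sketched as follows. This is a simplified illustration under stated assumptions: positions are 0-based transcript coordinates, frames are counted relative to the first nucleotide of the start codon, and the function name is hypothetical.

```python
def frame_fractions(five_prime_positions, cds_start):
    """Fraction of 5' ends in frames 0, 1, and 2 relative to the CDS start.

    Positions are transcript coordinates on the same 0-based scale as cds_start.
    """
    counts = [0, 0, 0]
    for pos in five_prime_positions:
        # Python's % always returns a non-negative result, so positions
        # upstream of the CDS start are assigned to a frame correctly.
        counts[(pos - cds_start) % 3] += 1
    total = sum(counts)
    if total == 0:
        return [0.0, 0.0, 0.0]
    return [c / total for c in counts]

# Hypothetical transcript: CDS starts at position 100.
fractions = frame_fractions([88, 91, 94, 97, 89, 100], cds_start=100)
best_frame_fraction = max(fractions)
```

A read length with strong periodicity concentrates most of its 5’ ends in one frame, so its largest fraction is well above one third.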

sample1

nucl

4.2.2 Selected read lengths and cutoffs

Cutoffs and frame statistics are shown for selected read lengths:

  • cutoff: 5’ end cutoff used to infer P-site positions
  • frame_preference: fraction of coverage in the frame with the most reads
  • gain_frame_codons: gain of in-frame signal, averaged over all transcripts
  • gain_frame_new_codons: gain of in-frame signal on codons not covered by other read lengths, averaged over all transcripts
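To illustrate how a 5’ end cutoff can be turned into a P-site position, the sketch below shifts the 5’ end of an ungapped read by the cutoff for its read length. It assumes that the cutoff is the distance in nucleotides from the 5’ end to the P-site; spliced reads are ignored, and the cutoff values and function name are hypothetical.

```python
def p_site_position(read_start, read_end, strand, cutoff):
    """Infer the P-site genomic coordinate of one ungapped read.

    read_start and read_end are inclusive genomic coordinates with
    read_start <= read_end; cutoff is the offset from the read's 5' end.
    """
    if strand == "+":
        return read_start + cutoff
    if strand == "-":
        # On the minus strand the 5' end is the rightmost coordinate.
        return read_end - cutoff
    raise ValueError("strand must be '+' or '-'")

# Hypothetical per-read-length cutoffs.
cutoffs = {28: 12, 29: 12}
read_length = 29
plus = p_site_position(1000, 1028, "+", cutoffs[read_length])
minus = p_site_position(1000, 1028, "-", cutoffs[read_length])
```

Because the shift is applied from the 5’ end, the same cutoff moves plus-strand reads to higher coordinates and minus-strand reads to lower ones.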

sample1

nucl

4.2.3 Choice of read lengths

Based on the parameters specified in the Ribo-seQC run, the following read lengths (with their offsets) were selected to infer P-site positions.

sample1

4.3 P-site profiles

Read coverage is displayed here as P-site profiles, with the same visualization options as in Section 4.1.

Note: Not all samples and originating organelles might be displayed here. Please check the parameters used in the Ribo-seQC run.

sample1

nucl

Plot panels: subcodon / none (read lengths: all, 25, 27, 28, 29, 30); subcodon / log2; subcodon / zscore; bins / none; bins / log2; bins / zscore.

5 Top 50 mapping positions

To reveal possible contaminating sequences, the top 50 mapping positions (by 5’ end count) are listed together with their genomic feature annotation and nucleotide sequences.
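Ranking mapping positions by 5’ end count amounts to a frequency count over positions. A minimal sketch, assuming each read has already been reduced to a hypothetical (chromosome, position, strand) tuple for its 5’ end:

```python
from collections import Counter

def top_positions(five_prime_ends, n=50):
    """Return the n most frequent (chromosome, position, strand) tuples."""
    return Counter(five_prime_ends).most_common(n)

# Hypothetical 5' ends; a real run would extract these from the BAM file.
ends = [("chr1", 100, "+")] * 5 + [("chrM", 42, "-")] * 3 + [("chr2", 7, "+")]
top = top_positions(ends, n=2)
```

Positions that collect a large share of all reads often point to rRNA, tRNA, or other abundant contaminants, which is why the annotation and sequence are listed next to the counts.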

sample1

6 Top 50 abundant genes

The 50 genes with the highest read counts are listed below for (i) CDS regions of protein-coding genes and (ii) all genes.

CDS genes

sample1

All genes

sample1

7 Positional codon usage

Based on the P-site positions (Section 4.3), codon usage within CDS regions of protein-coding genes is shown here. In addition, position-specific values are calculated for the first 11 codons of the CDS, 11 codons from the middle of the CDS, and the last 11 codons of the CDS; these regions are referred to as start, middle, and stop, respectively.
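The three 11-codon regions can be expressed as index windows over the codons of a CDS. The sketch below is an illustration only: centering the middle window on the CDS midpoint, counting the stop codon as part of the CDS, and requiring non-overlapping windows are all assumptions, and the function name is hypothetical.

```python
def codon_windows(n_codons, width=11):
    """Return codon index ranges for the start, middle, and stop regions."""
    if n_codons < 3 * width:
        raise ValueError("CDS too short for three non-overlapping windows")
    mid_start = (n_codons - width) // 2
    return {
        "start": range(0, width),
        "middle": range(mid_start, mid_start + width),
        "stop": range(n_codons - width, n_codons),
    }

# Hypothetical CDS of 100 codons.
windows = codon_windows(100)
```

Aligning transcripts on these windows allows codon-level signal to be compared at matching positions across genes of different lengths.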

Codon usage can be accessed with positional information, or summed up over all positions (Section 8 Bulk codon usage).

Codon counts shows codon occurrences at each position; P-sites counts shows the number of P-site positions mapping to each codon and position; P-sites per codon shows the ratio of P-sites counts to Codon counts. The same calculations are performed using A-site (P-sites shifted by +3 nt) and E-site (P-sites shifted by -3 nt) positions. These values are calculated for all read lengths together and for individual read lengths (available in the full report). Different scaling methods are also available.
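The relationship between Codon counts, site counts, and the per-codon ratios can be sketched for a single CDS as follows. This is a simplified, non-positional illustration, not the Ribo-seQC implementation; the function name, the toy sequence, and the choice to drop shifted positions that fall outside the CDS are assumptions.

```python
from collections import Counter

def codon_table(cds, p_sites):
    """Count codons and P-, A-, E-site hits per codon for one CDS.

    cds is an in-frame nucleotide string; p_sites are 0-based nucleotide
    positions within the CDS, one entry per read.
    """
    n_codons = len(cds) // 3
    codon_counts = Counter(cds[i * 3:i * 3 + 3] for i in range(n_codons))
    site_counts = {"P": Counter(), "A": Counter(), "E": Counter()}
    shifts = {"P": 0, "A": 3, "E": -3}  # A = P + 3 nt, E = P - 3 nt
    for site, shift in shifts.items():
        for pos in p_sites:
            shifted = pos + shift
            # Shifted positions outside the CDS are dropped (an assumption).
            if 0 <= shifted < n_codons * 3:
                start = (shifted // 3) * 3
                site_counts[site][cds[start:start + 3]] += 1
    # Ratio of site counts to codon occurrences, per codon.
    per_codon = {
        site: {codon: counts[codon] / codon_counts[codon] for codon in codon_counts}
        for site, counts in site_counts.items()
    }
    return codon_counts, site_counts, per_codon

codon_counts, site_counts, per_codon = codon_table("ATGGCTGCTTAA", [0, 3, 3, 6])
```

Dividing by the codon occurrences corrects for the fact that frequent codons collect more P-sites simply because they appear more often.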

Note: The genetic code, which assigns amino acids to codons, can differ between organelles, species and originating genomes. Different scales are used for ATG/stop codons and other codons.

Note: Codon usage calculation depends on successful P-site calculation.

sample1

nucl

all

Plot panels (each with none, log2, and zscore scaling): Codon counts; A-sites counts; P-sites counts; E-sites counts; A-sites per codon; P-sites per codon; E-sites per codon.

8 Bulk codon usage

Codon usage (see Section 7) is shown here summed over all CDS positions.

Note: Codons, as well as corresponding amino acids, are listed. The genetic code, which assigns amino acids to codons, can differ between organelles, species and originating genomes.

Note: Codon usage calculation depends on successful P-site calculation.

sample1

nucl

all

Plot panels: Codon counts; A-sites counts; P-sites counts; E-sites counts; A-sites per codon; P-sites per codon; E-sites per codon.